Quantifying the impact of homophily and influencer networks on song popularity prediction

Forecasting the popularity of new songs has become a standard practice in the music industry and provides a comparative advantage for those that do it well. Considerable efforts were put into machine learning prediction models for that purpose. It is known that in these models, relevant predictive parameters include intrinsic lyrical and acoustic characteristics, extrinsic factors (e.g., publisher influence and support), and the previous popularity of the artists. Much less attention was given to the social components of the spreading of song popularity. Recently, evidence for musical homophily—the tendency that people who are socially linked also share musical tastes—was reported. Here we determine how musical homophily can be used to predict song popularity. The study is based on an extensive dataset from the last.fm online music platform from which we can extract social links between listeners and their listening patterns. To quantify the importance of networks in the spreading of songs that eventually determines their popularity, we use musical homophily to design a predictive influence parameter and show that its inclusion in state-of-the-art machine learning models enhances predictions of song popularity. The influence parameter improves the prediction precision (TP/(TP \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$+$$\end{document}+ FP)) by about 50% from 0.14 to 0.21, indicating that the social component in the spreading of music plays at least as significant a role as the artist’s popularity or the impact of the genre.


Quantifying the impact of homophily and influencer networks on song popularity prediction
Niklas Reisz 1 , Vito D. P. Servedio 1 & Stefan Thurner 1,2,3* Forecasting the popularity of new songs has become a standard practice in the music industry and provides a comparative advantage for those that do it well.Considerable efforts were put into machine learning prediction models for that purpose.It is known that in these models, relevant predictive parameters include intrinsic lyrical and acoustic characteristics, extrinsic factors (e.g., publisher influence and support), and the previous popularity of the artists.Much less attention was given to the social components of the spreading of song popularity.Recently, evidence for musical homophily-the tendency that people who are socially linked also share musical tastes-was reported.
Here we determine how musical homophily can be used to predict song popularity.The study is based on an extensive dataset from the last.fmonline music platform from which we can extract social links between listeners and their listening patterns.To quantify the importance of networks in the spreading of songs that eventually determines their popularity, we use musical homophily to design a predictive influence parameter and show that its inclusion in state-of-the-art machine learning models enhances predictions of song popularity.The influence parameter improves the prediction precision (TP/(TP + FP)) by about 50% from 0.14 to 0.21, indicating that the social component in the spreading of music plays at least as significant a role as the artist's popularity or the impact of the genre.
While you read this paper's abstract, more than 50 new songs were released worldwide.Global music production has long reached such high levels that it is no longer possible to listen to every new song.On the world's largest music streaming platform Spotify, roughly 136 days worth of music are published daily 1,2 .This means that more music is produced in a year than one could listen to in an entire lifetime.In this ocean of new songs, there is a growing need for efficient selection and filtering, and the markets for attention have become increasingly competitive.This becomes apparent in increasingly imbalanced distributions of song popularity.While the majority of songs receive little to no attention, a small fraction become hits that are listened to billions of times 3 .
Predicting which songs have the potential to become one of these hits has become an increasingly important task for music publishers 4 .The issue has sparked substantial commercial interest as publishers focus their marketing activities on high-potential songs and artists 5 .For quantitative predictions, a multitude of different approaches has been explored.The vast volume of data made available by streaming platforms in the past decade triggered a boost of data-driven approaches.
In the early 2000s, a discussion started on the possibility of quantitatively predicting song popularity.Using traditional machine learning classifiers on lyrical and acoustic song attributes, it was attempted to identify hits prospectively 6 .It was concluded that predictions were significantly better than random, with the lyrical features slightly outperforming the acoustic ones.In 7 , it was claimed that the science of predicting hits, the so-called hit song science, was not yet a science as they demonstrated that the usual machine learning methods were not able to forecast the success of songs.This assertion was soon challenged in 8 , where the authors argued that, given a relevant set of acoustic song features, forecasting song popularity was possible with more specific machine learning approaches.In these early works, the emphasis was primarily on intrinsic acoustic and lyrics features.Later, other studies improved outcomes with more sophisticated methods and expansive datasets.Specific song attributes such as "happiness", "partyness" and repetitiveness were found to increase the chances of songs becoming hits 9,10 .Deep convolutional structures and machine learning regressions were convincingly shown to have predictive power [11][12][13] .
While these studies focused exclusively on intrinsic properties of songs, newer studies have included extrinsic features such as metadata or artist popularity.It was claimed that extrinsic factors often carry increased predicting power, the most predictive feature being the previous popularity of the artist 5 .Askin and Mauskapf and Shin and Park 14,15 confirmed this result by finding that both, the previous popularity of the artist and the support of the publisher, have a significant impact on the success chances of songs.In addition to having increased chances of making it into the charts, Im et al. 16 found that songs of well-known artists and big publishers also stay there longer.Reference 17 demonstrated a significant dominance of superstars in the market share and identified a positive correlation between song success and the artist's prior release count.For extrinsic song attributes, Yu et al. 18 found that Support Vector Regression (SVR) performs slightly better than neural networks.
Predictive indicators were also found in Twitter data 19 where music-related hashtags such as #nowplaying were counted.The number of daily tweets about specific songs and artists is highly correlated with the listening trend of that song.Counting the same tags, Tsiara et al. 20 found a moderate correlation between the number of tweets and the chart position of a song, as well as a weak correlation with the sentiment of the tweets.These Twitter-based correlations hint at a social component to the success of songs and that this information might be encoded in social networks.A similar thought was followed in 21 , where the spread of song popularity was modeled with a SIR disease spreading model.There, it is argued that social processes that lead to the spread of music are similar to those of disease spreading.For some genres, for which social connectivity is higher, songs appear to spread faster.
The fact that social networks co-create homophily-the tendency that people that share common treats are more likely to form social ties-has been observed in a variety of contexts, ranging from obesity 22 to performance in schools 23 .Also, in music listening behavior, the concept of homophily has been studied 24 .Recently, homophily's importance in relation to music preference was confirmed in a study on 1144 early adolescents, where music preference plays a significant role in selecting friends 25 .The online music-listening platform and social network last.fmhttps:// www.last.fm/ offers an excellent ground for studying homophily as it simultaneously provides access to both social links and music listening records of users 26 .In 27 , the authors study last.fmdata to predict friendship links.They find that music preference alone rarely leads to friendships.In most cases, sharing friends is the best predictor of future friendships, i.e. triadic closure that has been predicted in 28 and explains basic structures of social multilayer networks 29 .Reference 30 confirmed that people with similar music preferences tend to cluster, indicating that friends tend to listen to similar music.Homophily in music listening was found in explorative behavior 31 .By estimating how mainstream, novel, or diverse listening records of users are, they find that highly explorative people tend to look for friends that are similar.Reference 32 confirmed that result and designed a model that explains the finding, stressing that highly explorative users tend to be friends with users with similar discovery rates.While 31 use homophily to predict friendship links on last.fm, in 33,34 it is investigated how information about new artists spreads through social links and quantified to which extent user behavior is copied.It is demonstrated how homophily can be leveraged to improve song recommendations by recommending new songs to users shortly after close friends are listening to them.
While extensive research has focused on acoustic and metadata-based factors for predicting song popularity, the influence of social interactions has received comparatively less attention, despite its significant impact on listening behavior.In this study, we quantify the influence of social interactions, particularly homophily and measures derived from homophily, on song popularity.Our work reveals how homophily is self-reinforced as users influence their friends to engage with specific songs, leading to the emergence and popularity of new songs through cascades of recommendations.We quantify this user influence through state-of-the-art machinelearning predictions of song popularity, assessing the extent to which our predictions can be enhanced.Our approach seeks to harness the intertwined dynamics of user homophily and influence, rather than disentangling these closely related factors, with the ultimate goal of optimizing prediction accuracy in the context of music recommendation systems.

Homophily
To estimate homophily, we derived a dataset from the online music-listening platform and social network last.fm.Our data includes 300 million individual listening events of 10 million songs.It is enriched by the (undirected and unweighted) friendship network, where nodes represent users and links represent bidirectional friendships.It consists of 2.7 million nodes and 15 million links with an average degree of 11.6.The network is connected with a diameter of 6.This dataset allows us to link friendships to music-listening histories.For more details, see methods.
We first determine the music preferences of every user.Music preferences can be assessed in last.fmthrough user-defined tags.Users define and add tags such as Rock, 80s, or Acoustic to songs, artists, or albums.Here, we focus on artist tags because they are more abundant than song or album tags and offer more reliable statistics.Every artist, a, can have multiple tags, t, which are represented as weights, w at , ranging between 1 and 100, based on their frequency of appearance (100 corresponds to the most frequent tag for that artist).We then compute a music preference matrix, m ut , by summing over the weights, w at , for each tag, t, of each artist, a, for each time a user, u, is listening to a song by that artist.The music preference matrix, M, with elements m u,t is hence defined as where l us is the number of times user, u, listened to song, s. i sa = 1 if artist, a, interpreted song, s, and is zero otherwise.In the music preference matrix M, each row corresponds to a vector of music preference of one (1) m ut = a,s l us i sa w at , www.nature.com/scientificreports/user, expressed by the weighted artist tags.To compare the music preference vectors of users i and j, we use the cosine-similarity, defined as The idea will be to compare the alignment of music preference vectors for users that are friends to the alignment between randomly picked users (that are very likely not friends).

Influencer network
As a next step, we define the influencer network.Influence is the number of times a user's friends copied the listening behavior of that user within a specified time window.We define it algorithmically.Whenever a friend, u j , of a user, u i , listens to a song for the first time at time t j shortly after user, u i , listened to that song at time t i , the influence, I ij , of user, i, on user, j, increases by one if t j − t i < � is smaller than some threshold .A sketch of the concept and the derived influencer network are shown in Fig. 1.The influencer network (orange) is a directed network where users are nodes, and a weighted link represents the strength of a user's influence over another.In our case, it consists of 9200 nodes and 190,000 links, with an average degree of 21.The influencer network is connected and has a diameter of 11.Since friendship is a requirement for influence, the influencer network is a sub-network of the friendship network (blue), all nodes and links present in the influencer network also exist in the friendship network.Influence on last.fm is possible because users can see which songs their friends are listening to and can give specific song recommendations to them. (2) .

Prediction strategy
To simplify the prediction task of the future success of a song, we only classify songs into average songs and future hit songs.We define hit songs as songs with at least 1000 listenings in any one month, which corresponds to the top 1% of songs.We do this classification based on an initial sample of features or parameters derived from the first 200 times a new song has been listened to.That is, when calculating a parameter associated with user attributes for a song, we first compute this value for all users contributing to the first 200 listenings of the song, and subsequently, we obtain the song's predictive parameter by averaging these values.We define three classes of predictive parameters: preferential-attachment-based, time-based, and homophilybased.The preferential-attachment-based parameters are (i) the previous popularity of the artist and (ii) the genre's popularity.The time-based parameters are (iii) the time needed to reach 200 listenings, (iv) the tendency of users to re-listen to a song, (v) the number of users that listen to the song at least twice within a week, and (vi) the temporal trend quantified by the area under the curve of the cumulative listening count as a function of time.We also include two variations of these parameters, (vii) a normalized variation of (vi), where we take the average y-value instead of the area, and (viii) a variation of (vi) where we subtract the y-value from the diagonal and then compute the area.These constitute the basic eight parameters of the model.In addition to these parameters, we define homophily-based parameters and compute them for every song.For these, we identify the users that contributed to the first 200 times a song has been listened to.We use (ix-xi) the influence scores for three different time windows, as defined above, as well as (xii) the average cosine similarity between users and their friends as homophily-based parameters.Finally, we compute (xiii) the degree, (xiv) the PageRank, (xv) the nearest neighbor degree, and (xvi) the clustering coefficient both on the friendship network and the influencer network (xvii-xx).All parameters are described in detail in the methods section.We use these parameters as input for a machine learning ensemble to predict hit songs.
To determine whether the prediction of hit songs is best approached as a classification or regression task, we also explore this problem as a regression task, aiming to predict the exact number of listenings each song will receive.To establish comparability, we mapped the predicted listenings back to a binary classification of hit songs or average songs, using the 1000-listening threshold for hit songs.More details on this comparative approach can be found in the SI text C.

Influencer network
Analyzing the triad statistics of the influencer networks, we find high levels of reciprocity in the influencer network, see Fig. 2.This is seen by the fact that triangles that include reciprocal links are strongly over-represented (blue bars) with respect to a random graph with the same number of nodes and (directed) links (shuffled network, configuration model).Triangles with no reciprocity are under-represented (red).This finding indicates that, generally, influence goes in both directions.Users who discover a new song from a friend (getting influenced) often influence other friends into listening to the song, leading to cascades, such as shown in Fig. 1c.In the SI text A, we identify the "influencers" and "followers" by comparing the number of times users got influenced with how often they influenced others.Overall, we find that 32.5% of new song listens on last.fmqualify as being influenced by friends.

Homophily results
We confirm the presence of strong homophily among users that are friends on last.fm by comparing the musical preferences of 1000 randomly picked users to those of their friends.We find an average cosine-similarity of musical-preference vectors m u of cos(θ) = 0.58 .In contrast, when comparing the musical preference vectors to an equal number of randomly picked users, the average cosine-similarity drops to cos(θ) = 0.25 .A histogram of the respective cosine-similarities is shown in Fig. 3.The t-test for independent samples has a test statistic of 32 and a p-value of p ≤ 10 −200 .If the entries of the music preference vectors of the randomly picked users are shuffled in a random fashion, similarity decreases to levels below 10 −4 and essentially disappears.
We find that both the user's influence and the tendency to get influenced correlate with the average similarity in musical preferences.When comparing users to their friends, this correlation is highly significant with a p ≤ 10 −4 .In contrast, when comparing random users, we do not find any correlation.For more details, see SI text B.

Improving predictions
Both the homophily score and the influence score correlate with song popularity.Parameters that correlate with popularity can be tested if they also predict it.The Pearson correlation coefficients, r, between all parameters and the song popularity are found in Table 1.The strongest (anti) correlation for song popularity we find for the area under the temporal listening trend with r = −0.22 .Other time-based parameters have comparable correlations.The influence score correlates better ( r = 0.17 ) with song popularity than the previous popularity of the artist ( r = 0.14 ).Also, influencer network based parameters correlate similarly as artist popularity.
Table 2 shows the prediction results for three different models quantified by accuracy, precision, and recall.We predict if a song becomes a hit-song or an average song based on information from the 200 first listenings and the structure of the influencer and friendship networks.A hit song is defined as a song that is listened to at least 1000 times in its best month, putting it approximately in the top 1% of all songs.For the classification task, we use a machine learning ensemble including classifiers based on Support Vector Classification, Random Forest, Ada Boost, Gradient Boost, K-Neighbors, and a multilayer perceptron neural network, see Methods.Based on the input parameters, the machine learning models try to classify each song into one of two classes: hit songs or regular songs.In the first model, which we refer to as the combined model (a), we use the combined parameter sets with preferential-attachment-, time-, and homophily-based parameters (i-xx).In the second, the social networks-only scenario (b), we use the homophily-based parameters (ix-xx) only.In the third, we combine the preferential-attachment and time-based parameters (i-viii) without using the homophily-based parameters and refer to it as the baseline model (c).The baseline model is based on commonly used parameters from the literature and excludes any social-network-based parameters.It forms the baseline against which we want to compare the results of models (a) and (b).

Table 1.
Pearson correlation coefficients between prediction parameters and song popularity.Bold numbers show the strongest correlations for homophily-based and baseline parameters.The prediction parameters were computed on the first 200 listenings for each song.Song popularity was defined as the maximum number of times a song was listened to last.fm in its best month.All p-values are below 10 −200 .Only the best-performing variations of parameters are listed here.All other variations can be found in the "Methods" section.The strongest result is the improvement of hit-precision by 50% in the combined model (a) when compared to the baseline model (c).In addition, non-hit recall and accuracy improved by one percentage point at an already high level.On the downside, hit-recall decreased by around 14%. Adding the homophily-based influence score as a predictive parameter significantly improves accuracy and precision, as well as non-hit recall, at the cost of hit-recall.This suggests that the combined model (a) is able to highly reduce the number of false positives at the cost of missing a small number of true positives.The homophily-based model (b) on its own manages to already identify 40% of all hit songs correctly and reaches an accuracy of 95%.We see that while conventional models like our baseline model (c) are superior to the homophily-based model (b), the homophily-based model performs already on a high level, and a combination of both leads to the best results.In our evaluation, t-tests comparing the results of the three different models yielded p-values below 10 −4 , indicating statistically significant differences in performance, while all variances were in the range between 10 −4 and 10 −5 .
When comparing the classification and regression approaches, we do not observe statistically significant differences in the averages of accuracy, precision, or recall.This suggests that both methods are equally viable, and the choice between depends on the task at hand.For more details; see SI text C.

Discussion
We demonstrated how social interactions can be used to enhance song popularity predictions using a large dataset collected from the online platform last.fm.From an influencer network we derive an influence score for every user that captures their tendency to influence others to imitate their listening behavior.Based on a small sample of first listenings, we compute several metrics on the influencer network and use these as the predictive parameters in a machine learning classifier ensemble to categorize songs into potential hits and average songs.Our model exhibits up to 50% improved precision over a baseline model that uses common preferential attachment and time-based parameters.
The integration of social network-based parameters empowers our model to prioritize songs with a heightened likelihood of achieving rapid popularity in the current music landscape.This emphasis on the potential for swift growth enhances the model's precision by making its predictions more selective and precise.However, the model's slightly decreased recall may be attributed to its reduced sensitivity to songs that eventually become hits but do not exhibit strong social network signals during the prediction window.Some songs may take time to gain traction or might follow unconventional paths to success.The model's emphasis on current trends and interactions might lead it to overlook these less conventional hit trajectories, causing a decrease in recall.This trade-off between precision and recall has significant implications for the music industry and those seeking to invest in potential hit songs.The heightened precision of the new model provides more confidence in the songs it does predict as hits, thereby enhancing the decision-making process for investors looking to allocate resources to promising music ventures, ultimately leading to more informed and potentially more lucrative investment decisions in the dynamic and competitive music market.
Our analysis indicates that influence plays a significant role in shaping music listening behavior on last.fm, as approximately one-third of new song listens can be attributed to influence.Results from 33 suggest that the spread of music listening behavior occurs on relatively short time scales, with influence having a stronger effect in the first day, but decreasing over subsequent days and weeks.The used concept of influencing is closely related to homophily.Using the music preference vectors we estimate the similarity of tastes and find that people that are related through friendship links tend to have aligned tastes -i.e.strong homophily is confirmed, Table 2. Classification results for three different cases: (a) the combined model (preferential attachment, time, and homophily), (b) the model with only homophily-based social network parameters, and (c) the baseline model without explicit social network information.We see that a combined model (a) performs best, with the highest accuracy, hit precision, and non-hit recall.The homophily-based model (b) performs well on its own but is outclassed by the baseline model.www.nature.com/scientificreports/consistent with earlier findings of 30,33,36,37 .Influence represents the degree to which users can make their friends listening to the same music.This increases the similarity in the music preference vectors and drives homophily through adaptation.Research by 27,31 suggests that the main driver of homophily in music listening behavior is this adaptation of users' music listening profiles to be more similar to those of their friends, rather than changes in their friendship network.In that regard, last.fmdiffers from other systems such as for instance academic performance, where changes in network connections are more prevalent 23 .Additionally, we show that while some people tend to pioneer new music tastes and others tend to follow these pioneers, in most cases, influencer interactions go in both directions.This means that users that get influenced by their friends can in turn influence multiple other friends, leading to network structures that enable cascading spreading of new songs, fueling the new song's popularity.

Model
Influence between users contributes to preferential attachment.The more people listen to a song, the more likely it gets recommended to friends, thus the probability of it being listened to increases.Parameters that relate to either preferential attachment, such as the previous popularity of the artist, or to forgetting, such as the listening trend, have been widely used in previous works to predict the popularity of songs 5,[14][15][16] .In this study, we focus on these extrinsic properties and contribute new parameters that have the potential to improve the performance of existing models.
There are several severe limitations.Obviously, last.fmonly provides a partial view of what is going on in music listening and recommendations.There are many other channels where people listen to music and exchange information about new songs which leads to a "cold-start" problem: a song that is new on the last.fmplatform doesn't need to be new to its users.Hence, song popularity predictions might be offset by events that are beyond the scope of the available data.This is in part related also to the issue of comparability.A multitude of different datasets is used across the literature, each of them limited in some aspect, for instance, to a specific platform or a geographic region 5,9,11,12 .This is coupled with an apparent lack of a consistent definition of hit songs.In addition to that, in models with many parameters and hyperparameters, performance might be optimized with regard to different metrics.Altogether, specific results are difficult to compare across different studies.For this reason, we chose to use our own baseline model as a benchmark and aim for precise and intuitive definitions of popularity and hit songs, such that different machine learning models might be compared in a consistent way.In order to provide a realistic point of comparison, we aimed to optimize the performance of the aforementioned baseline model.Ultimately, the best-performing model we identified was an ensemble model.Our objective was to demonstrate that even the most effective models can be improved by incorporating social network information.Alternatively, if interpretability is the primary focus, one may choose to conduct the analysis using a single model, such as a random forest model.
In addition to these limitations, to some degree, popularity predictions might be self-fulfilling prophecies.It has been observed that people tend to reproduce perceived song popularity that is presented to them, even if these have been heavily modified [38][39][40] .Finally, there is also an ongoing discussion on whether artists should focus on improving popularity over other goals.Most popular doesn't necessarily imply most enjoyable or most relevant 41 .Rather, it indicates high commercial relevance, which might not necessarily be the top priority.However, even given these shortcomings, in this work, we were able to compare the predictions within a closed framework.Our main result is that we are able to find a substantial relative increase in predictability by including social information.The concept presented here can straightforwardly be transferred to other domains such as movies, books, posts, or even physical goods.Given the growing availability of social network data, comparable approaches might further uncover the social underpinnings of consumer behavior in our society.

Data
Our analysis is based on data collected from last.fm.Table 3 shows the number of songs, listenings, albums, artists, and users in the dataset that we collected for this study.Users are further broken down into users for which the full friendship data i.e. the account names of all their friends are known and users for which we collected the full listen history.Figure 4 shows the timeline of last.fm and marks the time periods for which data was fetched.The bootstrap phase was used to bootstrap the popularity of artists.

Data collection
For data acquisition, we employed the Python interface pylast 42 and initiated crawling by selecting seed users from online communities.The crawl commenced by querying for the friends of these selected users and expanded to their friends' friends and so forth, using a breadth-first search strategy.This approach aimed to establish a core set of users with complete friend lists as well as second-order friends, providing a comprehensive understanding of their links in the friendship network.While we captured a significant portion of the friendship network, including all friendship links of 100,000 users, we selectively collected listening histories due to the high number of required API requests (averaging approximately 20,000 listenings per user).This focused data collection involved a core set of 17,500 users, striking a balance between computational efficiency and maintaining a thorough representation of the listening history and friendship network.

Data pre-processing
In preprocessing Last.fm tags, we strategically addressed the inherent noise and potential inaccuracies associated with user-generated tags.Instead of relying solely on song tags, which can be prone to errors, we opted for a more robust approach by utilizing artist tags.This decision leveraged the much larger number of user votes for artists, enhancing the reliability of the tags used in our analysis.Last.fm utilizes a weighted tagging system, where the influence of tags is determined by the number of votes received, enabling us to mitigate the impact of potentially incorrect or less popular tags.To enhance the quality of our dataset, we introduced a manual blacklist for tags deemed misleading, harmful, or derogatory.This step ensured the exclusion of specific tags, irrespective of their popularity, contributing to a more accurate representation of user preferences.Additionally, our database incorporates merging rules to address instances where identical items might appear separately due to variations in spelling or the addition of extraneous information.The Python scripts for downloading, cleaning, and merging the data are accessible on the GitHub repository at https:// github.com/ Nikla sRz/ lastfm-data-downl oader.

Prediction parameters
We use the data to train a prediction model that is based on several social-and metadata-based parameters.For the social parameters, we use two networks: the friendship network and the influencer network.The friendship network of last.fmconsists of nodes that represent users and links that represent bidirectional friendships.The influencer network is a directed network where users are nodes, and a weighted link represents the strength of a user's influence over another.On these two networks, we define the following user metrics: (ix-xi) user influence with time windows of 6h, 12h and 24h.The influence score of each user is divided by the number of songs where a user influenced another user thus giving us the number of influencing events per song.This is equivalent to the out-degree of the user in the influencer network divided by the number of songs.(xii) homophily as defined here by the average cosine similarity of the music preference vectors between a user and their friends (xiii) the degree of each user, computed on the friendship network (xiv) the PageRank of each user (node), computed on the friendship network (xv) the nearest neighbor degree of each user, computed on the friendship network (xvi) the clustering coefficient of each user, computed on the friendship network (xvii) the degree of each user, computed on the influencer network (xviii) the PageRank of each user, computed on the influencer network (xix) the nearest neighbor degree of each user, computed on the influencer network (xx) the clustering coefficient of each user, computed on the influencer network Each of these user metrics is computed for the users that contribute the first 200 listenings to a song and averaged per song.The average is then the value of the predictive parameter for that song.

Figure 1 .
Figure 1.(a) Schematic depiction of user influence.If a user listens to a song at time t 1 and a friend of that user listens to the same song for the first time at time t 2 with t 2 − t 1 < � smaller than some threshold , the second user is said to have been influenced by the first with respect to that song.The influencer network is shown in orange, the friendship network in blue.The influencer network is a sub-network of the friendship network.(b) A fraction of the influencer network of last.fmfrom a 24 h time window.Several strong influencers are visible (influencers and influencing links in orange) within the friendship network (blue).(c) Example of a timeline of a single song.Dots along the x-axis represent different users that listen to a song for the first time (location of dot).Dots are ordered in time, from left to right.Blue dots are users that found the song on their own, while orange dots represent users that were influenced by their friends.Arrows mark influencing events, where one user influences another into listening to a song.Panel (b) of this figure was created using the open source tool Gephy v0.10 https:// gephi.org/.

Figure 2 .
Figure 2. Triadic census of the influencer network.Comparison to a shuffled network (configuration model), averaged over 100 random shufflings.Triad names follow the convention of 35 .

Figure 3 .
Figure 3. Cosine similarity distribution for users and their friends (blue), and between random users (red) that are typically not befriended.From the distribution, it becomes apparent that very similar people are almost certainly friends on last.fm.. https://doi.org/10.1038/s41598-024-58969-w

Figure 4 .
Figure 4. last.fmtimeline and data availability.The company was founded in 2002 in the UK.User data is available on the API starting from February 2005.In 2014 there was a major change to the system-users were not able anymore to listen to music directly on last.fmbut rather could connect their last.fmaccount to other streaming services such as Spotify.

Table 3 .
Available data that was collected on the last.fmfor the purpose of this study.Numbers are rounded down.

Table 4 .
Pearson correlation coefficients between prediction parameters and song popularity.The prediction parameters were computed on the first 200 listenings for each song.Song popularity was defined as the maximum number of times a song was listened to last.fm in its best month.All p-values are below 10 −50 .